133 research outputs found

Exploring concept representations for concept drift detection

Authors: Becher O.L. (Oliver); Elliott D. (Desmond); Hollink L. (Laura)
Publication date: 11/09/2017

We present an approach to estimating concept drift in online news. Our method is to construct temporal concept vectors from topic-annotated news articles, and to correlate the distance between the temporal concept vectors with edits to the Wikipedia entries of the concepts. We find improvements in the correlation when we split the news articles based on the number of articles mentioning a concept, instead of calendar-based units of time.
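
The count-based splitting mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the article fields (`date`, `text`), the bag-of-words vectors, and the cosine distance are all assumptions chosen for illustration, and the correlation with Wikipedia edits is not covered.

```python
from collections import Counter
from math import sqrt


def equal_count_bins(articles, size):
    """Split date-sorted articles into consecutive bins of `size` articles.

    This replaces calendar units (weeks, months) with bins that each hold
    the same number of articles; the last bin may be smaller.
    """
    ordered = sorted(articles, key=lambda a: a["date"])
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


def concept_vector(articles_in_bin):
    """Bag-of-words counts for one bin (a stand-in for a temporal concept vector)."""
    counts = Counter()
    for article in articles_in_bin:
        counts.update(article["text"].lower().split())
    return counts


def cosine_distance(u, v):
    """1 - cosine similarity of two sparse count vectors."""
    dot = sum(u[w] * v[w] for w in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return 1.0 - dot / norm if norm else 1.0


def drift_series(articles, size):
    """Distance between each pair of consecutive equal-count bins."""
    vectors = [concept_vector(b) for b in equal_count_bins(articles, size)]
    return [cosine_distance(a, b) for a, b in zip(vectors, vectors[1:])]
```

A series like the one returned by `drift_series` is the kind of signal that could then be compared against an external measure of change, such as edit counts per period.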

CWI's Institutional Repository

Bias in the analysis of multilingual legislative speech

Authors: Aggelen A.E. (Astrid) van; Hollink L. (Laura); Ossenbruggen J.R. (Jacco) van
Publication date: 04/07/2017

In this paper we investigate the application of natural language processing tools to the multilingual proceedings of the European Parliament. This work is part of a study in which we explore (1) how subcorpora in different languages may lead to different conclusions about the political landscape, (2) how to determine where a potential language-related bias originates, and (3) to what extent we can limit or even prevent an unwanted language bias.

CWI's Institutional Repository

A corpus of images and text in online news

Authors: Bedjeti A. (Adriatik); Elliott D. (Desmond); Harmelen M. van; Hollink L. (Laura)
Publication date: 23/05/2016

In recent years, several datasets have been released that include images and text, giving impulse to new methods that combine natural language processing and computer vision. However, there is a need for datasets of images in their natural textual context. The ION corpus contains 300K news articles published between August 2014 and August 2015 in five online newspapers from two countries. The 1-year coverage over multiple publishers ensures a broad scope in terms of topics, image quality and editorial viewpoints. The corpus consists of JSON-LD files with the following data about each article: the original URL of the article on the news publisher’s website, the date of publication, the headline of the article, the URL of the image displayed with the article (if any), and the caption of that image. Neither the article text nor the images themselves are included in the corpus. Instead, the images are distributed as high-dimensional feature vectors extracted from a Convolutional Neural Network, anticipating their use in computer vision tasks. The article text is represented as a list of automatically generated entity and topic annotations in the form of Wikipedia/DBpedia pages. This facilitates the selection of subsets of the corpus for separate analysis or evaluation.

CWI's Institutional Repository

SWISH DataLab: A Web Interface for Data Exploration and Analysis

Authors: Bogaard T. (Tessel); Hollink L. (Laura); Ossenbruggen J.R. (Jacco) van; Wielemaker J. (Jan)
Publication date: 08/12/2017

SWISH DataLab is a single integrated collaborative environment for data processing, exploration and analysis combining Prolog and R. The web interface makes it possible to share the data, the code of all processing steps and the results among researchers, and a versioning system facilitates reproducibility of the research at any chosen point. Using search logs from the National Library of the Netherlands combined with the collection content metadata, we demonstrate how to use SWISH DataLab for all stages of data analysis, using Prolog predicates, graph visualizations, and R.

CWI's Institutional Repository

The Portrait of a Lady: Close and distant reading of media gender bias

Authors: Aggelen A.E. (Astrid) van; Hollink L. (Laura)
Publication date: 11/09/2020

CWI's Institutional Repository

Interchanging lexical resources on the Semantic Web

Authors: Aguado de Cea G.; Buitelaar Paul; Cimiano Philipp; Declerck Thierry; Gracia Jorge; Gómez-Pérez A.; Hollink Laura; McCrae J.; Montiel-Ponsoda Elena; Spohr Dennis; Wunner Tobias
Publication venue: Facultad de Informática (UPM)
Publication date: 01/01/2012

Lexica and terminology databases play a vital role in many NLP applications, but currently most such resources are published in application-specific formats, or with custom access interfaces, leading to the problem that much of this data is in “data silos” and hence difficult to access. The Semantic Web and in particular the Linked Data initiative provide effective solutions to this problem, as well as possibilities for data reuse by inter-lexicon linking, and incorporation of data categories by dereferenceable URIs. The Semantic Web focuses on the use of ontologies to describe semantics on the Web, but currently there is no standard for providing complex lexical information for such ontologies and for describing the relationship between the lexicon and the ontology. We present our model, lemon, which aims to address these gaps.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Publications at Bielefeld University

Access to Research at National University of Ireland, Galway

Archivo Digital UPM

A larger-scale evaluation resource of terms and their shift direction for diachronic lexical semantics

Authors: Aggelen A.E. (Astrid) van; Fokkens A. (Antske); Hollink L. (Laura); Ossenbruggen J.R. (Jacco) van
Publication date: 30/09/2019

Determining how words have changed their meaning is an important topic in Natural Language Processing. However, evaluations of methods to characterise such change have been limited to small, handcrafted resources. We introduce an English evaluation set which is larger, more varied, and more realistic than seen to date, with terms derived from a historical thesaurus. Moreover, the dataset is unique in that it represents change as a shift from the term of interest to a WordNet synset. Using the synset lemmas, we can use this set to evaluate (standard) methods that detect change between word pairs, as well as (adapted) methods that detect the change between a term and a sense overall. We show that performance on the new dataset is much lower than earlier reported findings, setting a new standard.

CWI's Institutional Repository